The copy number and mutational landscape of recurrent ovarian high-grade serous carcinoma

The drivers of recurrence and resistance in ovarian high grade serous carcinoma remain unclear. We investigate the acquisition of resistance by collecting tumour biopsies from a cohort of 276 women with relapsed ovarian high grade serous carcinoma in the BriTROC-1 study. Panel sequencing shows close concordance between diagnosis and relapse, with only four discordant cases. There is also very strong concordance in copy number between diagnosis and relapse, with no significant difference in purity, ploidy or focal somatic copy number alterations, even when stratified by platinum sensitivity or prior chemotherapy lines. Copy number signatures are strongly correlated with immune cell infiltration, whilst diagnosis samples from patients with primary platinum resistance have increased rates of CCNE1 and KRAS amplification and copy number signature 1 exposure. Our data show that the ovarian high grade serous carcinoma genome is remarkably stable between diagnosis and relapse and acquired chemotherapy resistance does not select for common copy number drivers.

In addition to the HaplotypeCaller filters, further filtering was performed by two functions within the ampliconseq pipeline that model dataset noise. The first models substitution-specific noise at a specific locus across all libraries within a single sequencing run. The second models noise within individual libraries. Thresholds are determined from the modelled beta distributions using quantiles corresponding to a probability of 0.9999. All called variants below these two noise thresholds (library-specific and position-specific) are discarded. Variants which were not detected in both technical replicate libraries, or which did not pass all quality control filters for both technical replicates, were discarded.

Octopus
Germline variants were called with octopus using the individual calling mode, with default parameter thresholds and with the --disable-downsampling, --allow-marked-duplicates and --allow-octopus-duplicates flags all activated. Called variants which failed any of the default hard filter thresholds were discarded.
A second round of filtering using version 0.7.2 of the octopus germline random forest model was also implemented. Variants with a random forest predicted genotype quality score below the default threshold (i.e. 3) were discarded. Variants for which the predicted genotypes of the two technical replicates were discordant were also discarded.
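The second filtering round can be sketched as a small predicate. This is only an illustration: the record type and function name are hypothetical, and requiring both replicates to reach the threshold is an assumption, since the text does not say how the score is combined across replicates.

```python
from dataclasses import dataclass

RFGQ_THRESHOLD = 3.0  # default random forest threshold quoted in the text


@dataclass(frozen=True)
class ReplicateCall:
    """One germline call in one technical replicate (hypothetical record)."""
    genotype: str  # e.g. "0/1"
    rfgq: float    # random forest predicted genotype quality


def keep_germline_call(rep1: ReplicateCall, rep2: ReplicateCall,
                       threshold: float = RFGQ_THRESHOLD) -> bool:
    """Keep a variant only if both replicates pass and agree on genotype."""
    # Assumption: the quality threshold is applied to each replicate separately.
    passes_quality = rep1.rfgq >= threshold and rep2.rfgq >= threshold
    concordant = rep1.genotype == rep2.genotype
    return passes_quality and concordant
```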

Germline variant post-processing
Variants called by either the ampliconseq pipeline or octopus were functionally annotated and filtered as described below (see Variant annotation section). Variants identified through clinical testing (including for some patients without germline TAm-Seq sequencing) are detailed in Supplementary Data 1, which contains a table of short variants. By default, variants called using the octopus algorithm were accepted. Variants called using HaplotypeCaller required additional evidence in order to be accepted into the final call set: either concordance with octopus variant calling or concordance with the results of clinical testing. All instances of BRCA2:p.T3033Lfs*29 were removed from the final call set for the following reasons: a generally low mutant allele fraction, frequent failure of QC filters when all putative variants were considered together, and failure of this variant to pass QC filters when variants were called jointly between normal and tumour samples.
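The acceptance rule for sequencing-derived germline calls can be written as a short decision function. This is a minimal sketch: the function and argument names are hypothetical, and variants found only by clinical testing are deliberately outside its scope.

```python
# Variant removed from the final call set, as stated in the text.
EXCLUDED_VARIANTS = {"BRCA2:p.T3033Lfs*29"}


def accept_germline_variant(variant: str,
                            called_by_octopus: bool,
                            called_by_haplotypecaller: bool,
                            confirmed_by_clinical_testing: bool) -> bool:
    """Return True if a sequencing call enters the final germline call set."""
    if variant in EXCLUDED_VARIANTS:
        return False
    if called_by_octopus:
        # Octopus calls are accepted by default.
        return True
    # HaplotypeCaller-only calls need supporting evidence; octopus concordance
    # is already ruled out here, so only clinical testing can rescue the call.
    return called_by_haplotypecaller and confirmed_by_clinical_testing
```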

Tumour Sample Variant Calling
All variant calling on tumour samples was performed using the cancer calling mode of Octopus (v0.7.2) 3 , with the exception of the TP53 variants used to guide copy number calling, which were as reported previously 4,5,6 , although TP53 variants were also called independently for this study. Variant calling was performed on tumour samples in two primary modes, described individually here: a TP53 and a non-TP53 calling mode.

TP53 somatic variant calling
TP53 variants are called separately due to their unique role as necessary and ubiquitous drivers of tumourigenesis in high grade serous carcinoma 7 ; an identical TP53 mutation is therefore expected in all tumour samples from the same patient. For TP53, variants were called on each tumour sample individually.
Octopus was called in cancer calling mode with the following options. Downsampling of reads was disabled using the --disable-downsampling flag, and read duplicates recognised by the aligner and by Octopus were not removed, using the --allow-marked-duplicates and --allow-octopus-duplicates flags respectively. Expected somatic mutation frequencies were set at 0.03 and 0.01 using the --min-expected-somatic-frequency and --min-credible-somatic-frequency flags respectively. Octopus attempts to classify called variants as germline or somatic, and only variants classified as somatic were retained using the --somatics-only command line flag. Somatic variants were further filtered using default hard filter thresholds, with the exception of the AF parameter being set to 'AF < 0.03' for the --somatic-filter-expression flag. Variants which were not detected in both technical replicates for a given sample were discarded.
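For orientation, the options above can be assembled into a command line as in the sketch below. The calling flags and values are those named in the text. The input and output options (--reference, --reads, --regions-file, --caller, --output) and all file names are assumptions about a typical Octopus invocation and should be checked against the Octopus documentation. The filter expression is passed in by the caller, because the text modifies only the AF term of the default expression and the remaining terms are not reproduced here.

```python
import shlex


def build_tp53_octopus_command(reference: str, bam: str, regions_file: str,
                               output_vcf: str,
                               somatic_filter_expression: str) -> list:
    """Assemble an argument list for TP53 calling on one tumour library."""
    return [
        "octopus",
        "--reference", reference,        # assumed input option
        "--reads", bam,                  # assumed input option
        "--regions-file", regions_file,  # assumed input option
        "--caller", "cancer",            # assumed way to select cancer mode
        "--disable-downsampling",
        "--allow-marked-duplicates",
        "--allow-octopus-duplicates",
        "--min-expected-somatic-frequency", "0.03",
        "--min-credible-somatic-frequency", "0.01",
        "--somatics-only",
        "--somatic-filter-expression", somatic_filter_expression,
        "--output", output_vcf,          # assumed output option
    ]


def as_shell_string(command: list) -> str:
    """Quote the argument list for display or logging."""
    return shlex.join(command)
```

The command would be run once per technical replicate, with the replicate-concordance check applied to the two resulting call sets afterwards.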
The set of amplicon regions for TP53 variant calling was set to the union of the genomic ranges specified in amplicon panels 1, 10 and 28 (as described in Supplementary Data 5).
Because multiple TP53 variants were often detected per patient, all TP53 mutations identified per patient were classified as suspected driver or non-driver mutations. A combined ranking and scoring process was conducted for each patient's set of identified TP53 variants. The mutation designated as the likely driver for each patient was selected as the representative TP53 mutation for the construction of the oncoprints reported in this study (i.e. Figure 2, Figures S26 and S27).
More formally, the scoring process for each patient's set of TP53 variants was as follows. Let $M$ be the unique list of TP53 mutations identified for a patient, indexed $1, \dots, n$ for $n$ unique TP53 mutations. $\mathrm{maf}$, $\mathrm{nsample}$ and $\mathrm{qual}$ are descending rankings of $M$ by mutant allele fraction, number of tumour samples in which the variant was located, and variant quality score respectively, such that the set of elements of each of $\mathrm{maf}$, $\mathrm{nsample}$ and $\mathrm{qual}$ is equal to $\{1, \dots, n\}$. $\mathrm{penalty}$ accounts for potential batch effects by penalising variants which do not appear in both the diagnosis and relapse tumour classes, such that $\mathrm{penalty} \in \{0, 12\}$. Each variant's $\mathrm{score}$ combines the three rankings with the penalty, and the suspected TP53 driver mutation is the variant corresponding to $\min(\mathrm{score})$.
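A minimal sketch of this ranking and scoring step follows. It assumes the score is the unweighted sum of the three ranks plus the penalty, and that ties in a ranking are broken by input order so that each ranking is a permutation of 1 to n. Neither detail is stated explicitly in the text, and the record type and function names are hypothetical.

```python
from dataclasses import dataclass

BATCH_PENALTY = 12  # penalty value quoted in the text


@dataclass(frozen=True)
class Tp53Variant:
    """One candidate TP53 variant for a patient (hypothetical record)."""
    name: str
    maf: float          # mutant allele fraction
    n_samples: int      # number of tumour samples carrying the variant
    qual: float         # variant quality score
    in_diagnosis: bool
    in_relapse: bool


def descending_ranks(values):
    """Rank 1 goes to the largest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def select_driver(variants):
    """Return (driver, scores), where the driver has the minimum score."""
    maf_rank = descending_ranks([v.maf for v in variants])
    sample_rank = descending_ranks([v.n_samples for v in variants])
    qual_rank = descending_ranks([v.qual for v in variants])
    scores = []
    for i, v in enumerate(variants):
        penalty = 0 if (v.in_diagnosis and v.in_relapse) else BATCH_PENALTY
        # Assumption: unweighted sum of ranks plus the batch penalty.
        scores.append(maf_rank[i] + sample_rank[i] + qual_rank[i] + penalty)
    best = min(range(len(variants)), key=scores.__getitem__)
    return variants[best], scores
```

With a penalty of 12, a variant seen in only one tumour class is unlikely to be chosen unless a patient has a very large number of candidate variants.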

Non-TP53 somatic variant calling
Non-TP53 variants were called using four different analysis modes: unmatched and paired, matched and paired, matched and unpaired, and unmatched and unpaired. In this context, matching refers to analyses supported by a matched normal (non-tumour whole blood) sample from the same patient. Pairing (Figure 2) refers to the use of both diagnosis and relapse tumour types for a particular analysis mode. Matched analyses are of increased confidence and can be used to confidently classify variants as germline or somatic (Figures S26 and S27); however, matched analyses have restricted sample sizes due to the extra requirement for sequenced non-tumour samples.
Tumour samples which were suspected of low cellularity (either from the copy number analysis or from pathology reported predictions of sample cellularity) were not assessed for somatic variants. Samples with failed sequencing in any amplicons of the TP53 gene (i.e. coverage <100 for either technical replicate) were also not assessed for variants as these samples were not judged to have met quality control standards.
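The two exclusion rules can be expressed as a single eligibility check. This is a sketch only: the function name and the coverage layout (one pair of replicate coverages per TP53 amplicon) are hypothetical.

```python
MIN_TP53_COVERAGE = 100  # coverage cut-off quoted in the text


def eligible_for_somatic_calling(tp53_amplicon_coverage: dict,
                                 suspected_low_cellularity: bool) -> bool:
    """Decide whether a tumour sample is assessed for somatic variants.

    tp53_amplicon_coverage maps an amplicon name to a pair of coverages,
    one per technical replicate.
    """
    if suspected_low_cellularity:
        return False
    # Every TP53 amplicon must reach the cut-off in both replicates.
    return all(min(pair) >= MIN_TP53_COVERAGE
               for pair in tp53_amplicon_coverage.values())
```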
For all four non-TP53 somatic variant calling modes, joint variant calling was performed for all available tumour samples (and non-tumour samples where applicable) for a given patient. After core variant calling and filtering using octopus, a post hoc quality score threshold of 500 was applied. All variants which did not appear in both technical replicates for a sample were discarded. All variants were passed through additional fixation artefact and variant functional annotation filters as described in the sections below (Fixation artefact correction and Variant annotation).

Unmatched and unpaired analyses
Octopus was executed in cancer calling mode with --min-expected-somatic-frequency set to 0.03 and --min-credible-somatic-frequency set to 0.01. Additional flags passed were as follows: --disable-downsampling, --allow-marked-duplicates and --allow-octopus-duplicates.
The first round of variant filtering used both the germline and somatic random forest models (v0.7.2), and all variants not passing the random forest filter were discarded. A second round of variant filtering was applied using default threshold values, with the exception of 'QUAL < 10' for --filter-expression and 'AF < 0.01' for --somatic-filter-expression.
Variants which appeared at high frequencies in the study group were cross-referenced against the results of the corresponding matched and unpaired analysis. If the variant appeared at a much reduced frequency in the matched and unpaired analysis, then the putative recurrent mutation was regarded to be spurious. This was the case for two variants: BRCA1:p.Lys654SerfsTer47 and BRCA2:p.Thr3033LeufsTer29.
A known limitation of the octopus algorithm is that when calling variants in an unmatched analysis, germline variants with inflated MAFs due to events inducing loss of heterozygosity erroneously fail some hard filter thresholds (in particular the AF and AFB filters). To counteract this limitation of the algorithm, known germline mutations from the germline analysis were added post hoc to the set of variants called in unmatched analyses.

Unmatched and paired analyses
The aim of this analysis is to determine which variants are shared between, or exclusive to, the diagnosis and relapse tumour classes. This analysis mode occurs downstream of all analysis steps described above for the unmatched and unpaired analyses. Targeted variant calling (also referred to as specific variant calling) was performed on all variants discovered as part of the unmatched and unpaired analysis, this time with relaxed thresholds for some hard filters, specifically 'AF < 0.001' and 'AFB < 0.50' for the --filter-expression command-line flag. Relaxing parameter thresholds at this stage increases sensitivity for putative shared variants between the diagnosis and relapse tumour classes without substantially reducing specificity more generally across all assessed genomic loci. Variants which were discordant between diagnosis and relapse tumour samples were manually assessed using IGV in order to determine whether detection, or lack of detection, of the variant was due to algorithm error.
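Once targeted calling has been run, the shared and exclusive classification reduces to set operations over the variants detected in each tumour class. A minimal sketch, with hypothetical names, in which the discordant variants are the ones that would go forward for manual review:

```python
def classify_paired_variants(diagnosis: set, relapse: set) -> dict:
    """Split a patient's variants into shared and class-exclusive sets."""
    return {
        "shared": diagnosis & relapse,
        "diagnosis_only": diagnosis - relapse,
        "relapse_only": relapse - diagnosis,
    }


def variants_for_manual_review(diagnosis: set, relapse: set) -> set:
    """Discordant variants, i.e. those seen in only one tumour class."""
    return diagnosis ^ relapse
```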

Matched and unpaired analyses
For matched analyses, non-tumour bam files were supplied via the -N flag when calling octopus. As in previous analysis modes, --min-expected-somatic-frequency was set to 0.03 and --min-credible-somatic-frequency was set to 0.01.
The first round of variant filtering was conducted using both the germline and somatic random forest models (v0.7.2) supplied as part of octopus. The default threshold of 3 for the RFGQ_ALL parameter was applied. A subsequent round of filtering was applied using default hard filter thresholds, except for 'QUAL < 10' for --filter-expression and 'AF < 0.03' for the --somatic-filter-expression flag.

Matched and paired analyses
Targeted analyses were performed as described in the unmatched and paired analysis section. The results of this analysis are reported below: Due to differences in methodology, the results reported in Figure 2 are not expected to be perfectly aligned/concordant with the results reported in Figures S26 and S27. The main differences in methodology that explain these differences are as follows: i) Figures

Fixation artefact correction
In order to identify any potential substitution-specific artefacts, suspected artefacts in which a mutation appeared in only one of two technical replicates were counted, and MAF density estimates were produced. As previously reported 8 , a large enrichment of C>T transitions was identified in formalin fixed tissue compared to tissues preserved using different methods (e.g. UMFIX fixation in this study). Additionally, MAF density estimates for C>T substitutions from formalin fixed samples were shifted to the right compared to those from samples not fixed with formalin, indicating that when such artefactual transitions were detected, they were present to a greater extent in formalin fixed samples. The total number of artefactual mutations of this type detected in formalin fixed samples was also greater.
An additional MAF threshold of 0.23 was implemented for C>T substitutions (and correspondingly cognate G>A substitutions) as a result. In DNA samples with particularly poor quality DNA due to formalin fixation, artefact MAFs are inflated 9 leading to highly discordant MAFs between technical replicates. As a result, an additional C>T/G>A filter was also implemented for variants in which MAFs differed by more than 0.30 in order to remove further artefacts.
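The two C>T/G>A filters can be sketched as below. The thresholds are those given in the text. Applying the 0.23 threshold to the lower of the two replicate MAFs is an assumption, as the text does not state which MAF is tested, and the function name is hypothetical.

```python
FIXATION_SUBSTITUTIONS = {("C", "T"), ("G", "A")}
MIN_MAF = 0.23                 # additional MAF threshold from the text
MAX_REPLICATE_MAF_DIFF = 0.30  # maximum allowed MAF difference between replicates


def passes_fixation_filter(ref: str, alt: str,
                           maf_rep1: float, maf_rep2: float) -> bool:
    """Return False for suspected formalin fixation artefacts."""
    if (ref.upper(), alt.upper()) not in FIXATION_SUBSTITUTIONS:
        return True  # the extra filters only apply to C>T and G>A
    # Assumption: the threshold is applied to the lower replicate MAF.
    if min(maf_rep1, maf_rep2) < MIN_MAF:
        return False
    if abs(maf_rep1 - maf_rep2) > MAX_REPLICATE_MAF_DIFF:
        return False
    return True
```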

Variant annotation
Only annotations for a gene's canonical/representative transcript (as determined by VEP) were considered. Variants which were detected within genes within a short range upstream or downstream of the target gene were also discarded. Variants were further refined using the molecular tumour board portal (MTBP) 11 . More specifically, variants labelled as benign or likely benign by MTBP were discarded.

Inference of sample mislabelling events
Two different approaches were used and combined in order to identify putative sample mislabelling events. Firstly, the concordance between libraries belonging to normal and tumour samples deriving from the same patient was tested using a modified version of the HaveYouSwappedYourSamples method 12 . Namely, pairwise concordance scores were calculated for all normal-tumour library pairs by determining the proportion of high MAF variants (as determined by HaplotypeCaller) which were shared between each library pair. A threshold was then applied in order to classify library pairs as being either concordant or discordant. Normal-tumour sample pairs were classified as potentially discordant if they did not contain any expected concordant library pairs. Given the sparsity of the genomic information obtained from the amplicon sequencing data, no sample swap events could be determined with high confidence.
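The pairwise concordance step can be illustrated as follows. This is not the HaveYouSwappedYourSamples implementation: taking the proportion over the union of high MAF variants in the two libraries is an assumption, the threshold value is not given in the text and is left as a parameter, and all names are hypothetical.

```python
def concordance_score(variants_a: set, variants_b: set) -> float:
    """Proportion of high MAF variants shared by two libraries."""
    union = variants_a | variants_b
    if not union:
        return 0.0
    # Assumption: the denominator is the union of both libraries' variants.
    return len(variants_a & variants_b) / len(union)


def sample_pair_is_discordant(normal_libraries, tumour_libraries,
                              threshold: float) -> bool:
    """True if no normal-tumour library pair reaches the threshold."""
    return not any(concordance_score(n, t) >= threshold
                   for n in normal_libraries
                   for t in tumour_libraries)
```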
Secondly, a sample was identified as potentially being mislabelled if it contained what appeared to be a high MAF TP53 mutation that was discordant with the other TP53 mutations classified as 'driver' for that patient.

Modified QDNAseq implementation
QDNAseq was modified to allow for read counts to be corrected for GC and mappability whilst being able to transform data back into read count space. To do this, several modifications were made to QDNAseq. In the correctBins() function the following line was altered. After bin correction is performed using QDNAseq correctBins() function, the following transformation is applied to the binned copy number data to correct by the estimated bin correction generated by the estimateCorrection() function.

Profile fitting
A grid search was performed across ploidy and purity ranges to fit absolute copy number profiles. A range of quantitative and qualitative metrics was used to select the correct ploidy-purity combination from the single best fit, or set of best fits, for a given copy number profile. Samples with multiple equally likely fits were assessed independently by two investigators to select the best fit or to exclude the sample from downstream analysis. Discordant assessments were discussed until a consensus was reached.
Sample fits were also subject to fit 'power' calculations, in which fits with insufficient reads to support the selected ploidy-purity combination were excluded from selection. Sufficiently 'powered' fits after quality control underwent downsampling to a fixed read depth of 15 reads per bin per tumour copy which acts to normalise inter-sample and intra-patient variance caused by varying coverage between samples. This manifests as an increased or decreased standard deviation in the bins associated with a given segment. Downsampled absolute copy number profiles were then generated and selected in the same manner as described previously, with selected fits used for downstream analysis.
Lastly, selected downsampled fits were subject to profile variance filtering. While the downsampling process seeks to reduce the amount of intra-patient and inter-sample bin variance, some absolute copy number profiles still retain a high degree of bin variance across their selected copy number fit. The standard deviations of bin values across copy number states 1, 2, 3, and 4 were calculated, and samples whose standard deviation of all bin values across all copy number states exceeded 3 standard deviations above the mean were removed, excluding four samples from downstream analysis (IM_159, IM_181, JBLAB-19324, JBLAB-4121).
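The outlier rule can be sketched as below, taking one summary standard deviation per sample as input. How that per-sample value is derived from the bins is not reproduced here, the use of the sample (n - 1) standard deviation for the cohort is an assumption, and the function name is hypothetical.

```python
from statistics import mean, stdev


def flag_high_variance_samples(sample_sd: dict, n_sd: float = 3.0) -> list:
    """Return samples whose bin SD exceeds the cohort mean + n_sd * SD.

    sample_sd maps a sample name to its standard deviation of bin values.
    """
    values = list(sample_sd.values())
    # Assumption: cohort spread is the sample standard deviation.
    cutoff = mean(values) + n_sd * stdev(values)
    return sorted(name for name, sd in sample_sd.items() if sd > cutoff)
```

Note that with very small cohorts no sample can exceed a three standard deviation cut-off, so the rule is only meaningful for cohorts of roughly a dozen samples or more.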

Copy number event calling
Copy number events were defined as a given segment in a copy number profile, under the assumption that each segment called as a CN event is independent of other neighbouring segment changes. For the analysis of CNA (both focal/gene-level and broad) copy number thresholds were used as defined by COSMIC and Allele-specific copy number analysis of tumours 13 .
Average genome ploidy ≤2.7
- Amplification: total copy number ≥5
- Deletion: total copy number = 0

Average genome ploidy >2.7
- Amplification: total copy number ≥9
- Deletion: total copy number < (average genome ploidy - 2.7)

For gene-level events extending across more than one 30 kb genome bin, the mean of all intersected bins was taken. Broad events are defined on the basis of the proportion of the affected cytoband, with a threshold of 80% called as either amplified or deleted. For arm-level events, a threshold of 50% of a chromosome arm (as a proportion of supporting bins) was selected. Copy number events which were equal to plus or minus one from the sample ploidy, but not called as amplifications or deletions, were termed gains and losses, respectively.
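The amplification and deletion thresholds translate directly into a small classifier. In this sketch the function names are hypothetical, a total copy number of zero is tested as "less than or equal to zero" to tolerate fitted values, and gains and losses are not modelled.

```python
from statistics import mean

PLOIDY_CUTOFF = 2.7


def gene_level_copy_number(bin_copy_numbers) -> float:
    """Mean copy number over all 30 kb bins intersecting a gene."""
    return mean(bin_copy_numbers)


def call_copy_number_event(total_cn: float, ploidy: float) -> str:
    """Classify a segment or gene as amplification, deletion or neither."""
    if ploidy <= PLOIDY_CUTOFF:
        if total_cn >= 5:
            return "amplification"
        if total_cn <= 0:  # "total copy number = 0" in the text
            return "deletion"
    else:
        if total_cn >= 9:
            return "amplification"
        if total_cn < ploidy - PLOIDY_CUTOFF:
            return "deletion"
    return "neither"
```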

Ploidy changes
Ploidy changes are difficult to assess due to the nature of absolute copy number fitting, where an incorrectly selected ploidy-purity combination would appear as a change in ploidy between the diagnosis and relapse samples. Here, patients with suspected ploidy changes underwent a scoring methodology to assess the likelihood of a true ploidy change versus a technical error during absolute profile fitting. Change in ploidy between samples was defined as:

$\Delta\tau_i = \tau_i^{\mathrm{relapse}} - \tau_i^{\mathrm{diagnosis}}$

where $i$ is the patient, $\tau_i^{\mathrm{relapse}}$ is the ploidy of relapse samples for patient $i$, and $\tau_i^{\mathrm{diagnosis}}$ is the ploidy of diagnosis samples. Where multiple samples occur for a given sample group, a median value is taken. An absolute value of one or greater is defined as a patient with a ploidy change.
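The definition can be sketched in a few lines. The order of subtraction (relapse minus diagnosis) is an assumption that does not affect which patients are flagged, since only the absolute value is tested. Function names are hypothetical.

```python
from statistics import median


def ploidy_change(relapse_ploidies, diagnosis_ploidies) -> float:
    """Median relapse ploidy minus median diagnosis ploidy for one patient."""
    return median(relapse_ploidies) - median(diagnosis_ploidies)


def has_ploidy_change(relapse_ploidies, diagnosis_ploidies) -> bool:
    """A patient has a ploidy change if the absolute difference is >= 1."""
    return abs(ploidy_change(relapse_ploidies, diagnosis_ploidies)) >= 1.0
```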
Ploidy change patients were assessed against three criteria to determine the confidence that a ploidy change is true rather than a consequence of erroneous or poor quality copy number fitting. These criteria are: 1) Selected fits have the highest scoring quantitative quality metrics compared to other sufficiently powered copy number fits (clonality error, TP53 estimate). 2) No underpowered fits with otherwise acceptable quality metrics are available which would contradict the selected copy number fit. 3) Additional samples, attributed to either the diagnosis or relapse group, support the ploidy change by also conforming to criterion 1 and/or criterion 2.
Each criterion met gives a patient's ploidy change one star, up to a maximum of three stars for a ploidy change with the highest confidence. Patients with a ploidy change and their assigned ratings are detailed in Table S8.
For patient-specific analyses (patient loci clustering & gene correlation and heatmap), these ploidy change samples were excluded from the analysis due to the impact on patient clustering. This can be visualised in Figure S28, where ploidy change patients constitute a large proportion of the more extreme copy number changes between diagnosis and relapse.

Purity differences
As expected, we observed differences in tumour purity values for absolute fitted copy number profiles across different biopsy sites and fixation methods, but purity was still consistent between diagnosis and relapse (p=0.

Intra-tumour heterogeneity
As implemented by van Dijk et al. 14 , copy number heterogeneity (CNH) is calculated by minimising segment distance from integer state using a ploidy-purity grid search over segments $i$:

$\mathrm{CNH} = \min_{\alpha,\tau} \sum_{i} w_i d_i$ (1)

where $d$ is the absolute distance of a segment from an integer, defined as:

$d_i = |q_i - \mathrm{round}(q_i)|$ (2)

where $q$ is the absolute copy number of a segment, $\alpha$ is the sample purity, $\tau$ is the average sample ploidy, and $w$ is the segment width.
Our implementation forgoes the ploidy-purity grid search for the lowest chromosomal copy number heterogeneity, as ploidy and purity values for a given sample have already been determined during absolute fitting. As such, we calculate CNH (hereafter referred to as intra-tumour heterogeneity; ITH) as:

$\mathrm{ITH} = \sum_{i} w_i d_i$ (3)

where $d$ is the absolute distance of a segment from an integer as defined in equation 2. Noisy segments were excluded as described by van Dijk et al. 14 using the standard deviation of the mean (σμ) bin distributions across each segment.
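A minimal sketch of the ITH calculation for a sample whose absolute copy number has already been fitted. It assumes that segment widths are normalised to sum to one, so that ITH is a width-weighted mean distance bounded by 0.5. That normalisation and the function names are assumptions rather than details confirmed by the text.

```python
def segment_distance(absolute_cn: float) -> float:
    """Absolute distance of a segment's copy number from the nearest integer."""
    return abs(absolute_cn - round(absolute_cn))


def intra_tumour_heterogeneity(segments) -> float:
    """Width-weighted mean distance from integer copy number.

    segments is an iterable of (absolute_copy_number, width) pairs.
    """
    segments = list(segments)
    total_width = sum(width for _, width in segments)
    if total_width <= 0:
        raise ValueError("segments must have positive total width")
    # Assumption: widths are normalised so that the weights sum to one.
    weighted = sum(width * segment_distance(cn) for cn, width in segments)
    return weighted / total_width
```

A profile in which every segment sits exactly on an integer state gives an ITH of 0, and larger values indicate more segments at non-integer (subclonal) copy number.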
Noise thresholds were set at 2 standard deviations above the mean noise, giving cutoffs of σμ > 1.48 and σμ > 0.875 for segments and samples, respectively. Noise thresholds removed nine samples (3.4%) and 685 segments (1.37%, mean and median of 2.59 and 1.00 segments per sample, respectively). After sample exclusion, segment filtering removed 467 segments (0.97%, mean and median of 1.82 and 1.00 segments per sample, respectively) (Figure S29).

Copy number signature abundance modelling
Partial ILR-Bernoulli model
Compositional data are defined by their sum constraint (exposures add up to one) and positivity (exposures are equal to or larger than zero); therefore, any regression methods used to analyse them have to be appropriate for a multivariate compositional response. The basis of compositional data analysis 15 is that a compositional vector of length $d$ can be transformed to an unconstrained vector in $\mathbb{R}^{d-1}$ without loss of information, thereby removing the sum constraint.
Here, as we have described previously 16 , we use the Isometric Log-Ratio (ILR) transformation 17 , in which we use an orthonormal basis to transform the data.
A further challenge in copy number exposure data is the presence of zero values. We address it by using a variant of the ILR transformation, the partial ILR, in which only non-zero values are taken into account. The presence or absence of signatures is analysed using a Bernoulli model. We introduce mixed effects in both models to capture the information about paired diagnosis and relapse samples. The models are implemented in Template Model Builder 18 and run through R.

Model interpretability
The transformation of compositional data can be explained as follows: instead of analysing signatures s1 through s7 directly, we analyse the following signature comparisons after ILR transformation: s1 vs s2, s3 vs the mean of s1 and s2, s4 vs the mean of s1-s3, and so forth. This leads to a total of six pairwise comparisons (Figure S30). Each comparison is made by taking the log-ratio of signatures (or groups of signatures), and we use the geometric mean to group signatures.
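The sequence of comparisons described here corresponds to an ILR with a sequential (pivot-style) orthonormal basis, sketched below for strictly positive exposures. The sign convention, with the newly added signature in the numerator, is an assumption chosen to match the worked interpretation of ILR2 in the example that follows. The function name is hypothetical, and the zero handling of the partial ILR is not covered.

```python
import math


def ilr(composition):
    """Isometric log-ratio coordinates of a strictly positive composition.

    Coordinate k compares part k+1 with the geometric mean of parts 1..k.
    """
    if any(x <= 0 for x in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(x) for x in composition]
    coords = []
    for k in range(1, len(logs)):
        log_gmean_first_k = sum(logs[:k]) / k
        scale = math.sqrt(k / (k + 1))
        # Assumption: part k+1 is the numerator of the log-ratio.
        coords.append(scale * (logs[k] - log_gmean_first_k))
    return coords
```

For seven signatures this returns six coordinates, one per comparison.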
The two parameters of interest in the model are the intercept and the slope. Both are vectors of length 6 (i.e. as many as comparisons). The intercept indicates, in transformed space, the abundance of signatures in the first group of samples. This intercept can be transformed back to compositional data (using the inverse ILR transformation) to get the mean abundance of signatures in the first group. The slope is the difference in signature abundance between the groups, in transformed space. Therefore, the sum of the intercept and the slope gives us the mean abundance of signatures in the second group, in transformed space. A slope of zero indicates that the exposures are not different between groups.
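To make the back-transformation concrete, the sketch below maps an intercept, and an intercept plus slope, from ILR space back to mean compositions for the two groups. It assumes a sequential orthonormal basis in which coordinate k contrasts part k+1 (numerator) with the geometric mean of parts 1 to k. The basis choice and function names are assumptions, not the Template Model Builder implementation.

```python
import math


def ilr_inverse(coords):
    """Map ILR coordinates back to a composition that sums to one."""
    d = len(coords) + 1
    clr = [0.0] * d
    for k, z in enumerate(coords, start=1):
        norm = math.sqrt(k * (k + 1))
        for i in range(k):
            clr[i] -= z / norm       # parts 1..k share the negative weight
        clr[k] += z * k / norm       # part k+1 carries the positive weight
    exponentiated = [math.exp(c) for c in clr]
    total = sum(exponentiated)
    return [e / total for e in exponentiated]


def group_mean_compositions(intercept, slope):
    """Mean signature abundances for group 1 and group 2."""
    group1 = ilr_inverse(intercept)
    group2 = ilr_inverse([a + b for a, b in zip(intercept, slope)])
    return group1, group2
```

With a slope of zero in every coordinate the two returned compositions are identical, which matches the interpretation of a zero slope given above.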
As an example, Figure S31 shows the intercept and slope for a scenario in which there are three mutational signatures (therefore, they are both vectors of length two). For the intercept, the first ILR is close to zero, indicating that the log-ratio between s1 and s2 is close to zero, and that therefore the mean abundance of s1 and s2 in the first group is roughly the same. ILR2 is negative, indicating that the abundance of s3 is lower than the geometric mean of s1 and s2. For the beta slopes, that of ILR1 is slightly negative, indicating that the ratio between s1 and s2 is a bit lower in the second group than in the first group. The slope for ILR2 is positive, indicating that s3 is more prevalent in the second group.

Sample preparation
Tissue microarrays (TMA) were created using 1 mm cores from viable archival formalin-fixed paraffin-embedded blocks of samples from the BriTROC study. Three representative cores were taken from each block. 3 μm sections of the TMA blocks were cut using a Leica microtome, transferred to a water bath pre-heated to 60°C and collected on SuperFrost Plus glass slides. The sections were dried overnight at 37°C, then baked at 60°C for 1 h to remove excess wax.

Automated Staining
Automated staining was carried out using the Ventana Discovery Ultra platform. All bulk reagents used were purchased from Roche. Sections were rehydrated by incubating them in EZ prep solution for 32 min at 69°C, then antigen retrieval was performed by incubating the sections for 1 h in Ventana Cell Conditioning buffer 1 (CC1) at 96°C (pH 8.5). After a 4 min incubation in hydrogen peroxide, anti-human primary antibodies were applied and incubated for 1 h at 37°C (CD3, Roche 790-4341, prediluted; CD8, Spring Bioscience M5394, 0.5 μg/ml; Pan-Keratin, Roche 760-2135, prediluted). After a 16 min incubation with an HRP-conjugated secondary antibody, the chromogenic signal was developed in either 3,3'-diaminobenzidine (DAB) chromogen for 8 min, Purple chromogen for 40 min, Yellow chromogen for 28 min or Teal chromogen for 8 min. For chromogenic counter-staining, the slides were incubated for 4 min in Copper, 8 min in haematoxylin and 4 min in Bluing Reagent. Stained slides were dehydrated using the Leica Autostainer ST020, manually cover-slipped and digitally scanned using an Aperio Scanscope XT. Antibodies were validated in the Cancer Research UK Cambridge Institute Pathology Core by staining human lymphoid tissue and confirming appropriate staining location with a pathologist. As the markers are CD8 and CD3, a non-lymphoid tissue was used as a negative control. The antibodies were further validated by multiplex staining, where the overlap between CD3 and CD8 was confirmed.

Image analyses
Analysis of IHC images was done using the HALO Image Analysis Platform (Indica labs). Nuclear segmentation was performed using haematoxylin staining intensity, and cell margins were defined using watershed. The number of cells positive for CD8, CD3, FOXP3 or CD20 was quantified by setting a minimum intensity threshold for each chromogen, and the density was then calculated per area of tissue (global density). Immune cell subtypes were defined as: CD8+ (all CD8+ T cells); CD8− (CD3+ CD8− T cells); CD3 (the total density of CD3+ cells, inferred from the sum of CD8+ and CD8− cell densities). Images were classified based on pan-keratin staining pattern into tumour (panK+) and stroma (panK−), and each cell density was calculated per region of interest. To test the accuracy of the tissue classifiers, 237 classified images were scored by a pathologist; a score of 1 was given for >90% accuracy, 2 for 70-90% and 3 for <70% accuracy. 76% (n=179) of images scored 1, 16% (n=39) scored 2 and 8% (n=19) scored 3, indicating fair accuracy of the classifiers. Global (GD), tumour (TD) and stromal (SD) density of each cell type were obtained by dividing the cell count in that location by the area of the same location.

Quantification of tumour immunohistochemistry markers
After quantification of marker-positive cells across fixed image areas, grouped by stromal, tumour, and all tissue, cell counts were normalised to marker-positive cells per micrometre squared (cells/μm²) and were further rescaled to log(1 + X), where X is cells/μm², to account for extreme positive counts and zero counts for each image.
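The normalisation and rescaling can be sketched as below. The natural logarithm is assumed, since the base is not stated, and the function name is hypothetical.

```python
import math


def rescaled_density(cell_count: int, area_um2: float) -> float:
    """Return log(1 + X), where X is marker-positive cells per square micrometre."""
    if area_um2 <= 0:
        raise ValueError("area must be positive")
    density = cell_count / area_um2
    # log1p keeps zero counts at exactly zero and compresses extreme densities.
    return math.log1p(density)
```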

Figure S2. Germline SNVs and short indels identified in key homologous recombination pathway genes
Germline DNA extracted from whole blood samples from 228 BriTROC-1 patients was tested for short variants in key HR genes. Each column represents one patient, colour coded to denote patient platinum sensitivity status at study entry. The lower legend denotes variant type. FANCM and BARD1 were also tested, but no mutations were identified for any patient.

Figure S3. Whole cohort-level detection of SNVs and short indels in key cancer related genes (unpaired)
DNA samples extracted from all tumour samples (both diagnosis and relapse) from 265 patients were tested for short variants in 20 relevant cancer genes. Mutations were not classified as somatic or germline in this analysis, nor classified by relapse status (diagnosis vs relapse). Samples were not matched with corresponding normal DNA for each patient. The lower legend denotes variant type. EGFR, FANCM, RAD51C, PALB2, BRAF and CTNNB1 were also targeted, but no mutations were identified.

Figure S22. Copy number change matrix clustering
A - UMAP dimensional reduction of the copy number change matrix shown in Figure 6A. No obvious patterns of patient clustering can be identified. B - Visualisation of the total within-cluster sum of squares for cluster numbers 1 through 10 for k-means clustering. This process should typically identify an "elbow" to select as the optimal number of clusters. C - Re-visualisation of the UMAP dimensional reduction with the purported optimal k-means clusters, which demonstrated little to no clustering of patients.

Figure S30. Visualisation of ILR pairwise comparisons within partial ILR model
Tree structure visualisation represents the pairwise comparisons of ratios between various subgroupings of transformed signatures. For example ILR1 is the ratio of s1 to s2, ILR4 is the ratio of the geometric mean of s1-s4 to s5.

Figure S31. Example beta slope and beta intercept plot
Plot demonstrates an example outcome from a set of 3 signatures modelled using the described partial ILR transformation for two groups of simulated signatures (n = 50 each for group 1 and group 2). Left plot is the beta intercept for ILR1 and ILR2, which are the transformed ratios of signatures s1-s3. Right plot is the beta slope for ILR1 and ILR2, which are the transformed ratios of signatures s1-s3. Error bars represent mean ± the standard error of the mean (SEM).